Challenges with segmenting intraoperative ultrasound for brain tumours

Objective - To address the challenges that come with identifying and delineating brain tumours in intraoperative ultrasound. Our goal is to both qualitatively and quantitatively assess the interobserver variation, amongst experienced neuro-oncological intraoperative ultrasound users (neurosurgeons and neuroradiologists), in detecting and segmenting brain tumours on ultrasound. We then propose that, due to the inherent challenges of this task, annotation by localisation of the entire tumour mass with a bounding box could serve as an ancillary solution to segmentation for clinical training, encompassing margin uncertainty and the curation of large datasets. Methods - 30 ultrasound images of brain lesions in 30 patients were annotated by 4 annotators - 1 neuroradiologist and 3 neurosurgeons. The annotation variation of the 3 neurosurgeons was first measured, and then the annotations of each neurosurgeon were individually compared to the neuroradiologist’s, which served as a reference standard as their segmentations were further refined by cross-reference to the preoperative magnetic resonance imaging (MRI). The following statistical metrics were used: Intersection Over Union (IoU), Sørensen-Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD). These annotations were then converted into bounding boxes for the same evaluation.
Results - There was a moderate level of interobserver variance between the neurosurgeons [IoU: 0.789, DSC: 0.876, HD: 103.227] and a larger level of variance when compared against the MRI-informed reference standard annotations by the neuroradiologist, mean across annotators [IoU: 0.723, DSC: 0.813, HD: 115.675]. After converting the segments to bounding boxes, all metrics improve; most significantly, the interquartile range drops by [IoU: 37%, DSC: 41%, HD: 54%]. Conclusion - This study highlights the current challenges with detecting and defining tumour boundaries in neuro-oncological intraoperative brain ultrasound. We then show that bounding box annotation could serve as a useful complementary approach for both clinical and technical reasons.


Introduction
Maximal-safe resection of brain tumours is a key pillar of modern neuro-oncology management, improving symptoms, quality of life and overall survival [11]. Accurate delineation of a lesion from surrounding normal functional brain tissue remains challenging but is critical to ensure optimal resection. Ultrasound (US) has been employed intraoperatively (intraoperative US, iUS) in neurosurgery for 70 years [9]. It is applied for surgical guidance, as it allows real-time detection, characterisation and outlining of tumours. Unlike magnetic resonance imaging (MRI), specifically intraoperative MRI (iMRI), it integrates easily into the surgical workflow and is relatively affordable. With the availability of high-quality US machines and navigated US co-registered with preoperative MRI and computed tomography (CT) [5,10,13,18,19], its adoption has increased over the last two decades.
Author note: Alistair Weld, Luke Dixon and Giulio Anichini contributed equally to this work.
However, factors associated with the quality of the modality, as well as the standardisation of training, have affected universal adoption [4]. There remains a perceived steep learning curve secondary to limited fields of view with unfamiliar topographical representation, artefacts, the unique visuo-tactile task and the difficulty of gaining experience outside of the intraoperative setting. There are also concerns regarding the imaging accuracy and granularity. These factors can make learning iUS difficult, with the potential to impair tumour and tumour boundary detection, with the risk of leaving unintended residuum or causing inadvertent damage. This challenge is further compounded by the inherently great variation in types of brain lesions, their appearances, the degree of infiltration and the intraoperative changes (such as oedema and haemorrhage), which can further confound interpretation.
Presently, there have been only a few meta-analyses looking at the impact of iUS in glioma surgery. One pooled series reported an average 77% gross total resection rate in 739 patients undergoing iUS-guided resection (71.9% in high-grade glioma (HGG) compared to 78.1% in low-grade glioma (LGG)), which was comparable to other forms of navigation. A recent meta-analysis of 409 diffuse gliomas compared the accuracy of iUS to the reference standard post-operative MRI. They found that iUS was an effective technique in assessing diffuse glioma resection, with a 72.2% pooled sensitivity and a 93.5% pooled specificity [30]. Whilst these results are encouraging, the current evidence supports a need to improve iUS accuracy if it is to become part of the standard of care in brain tumour surgery. Trials assessing the role of US in neurosurgery, such as the randomized controlled trial Functional and Ultrasound-Guided Resection of Glioblastoma (FUTURE-GB), and refining of US techniques are therefore needed [23].
There are three aims of this study. Firstly, to assess whether tumour boundary detection on iUS is challenging, we measure interobserver variation between regular iUS operators in segmenting US images of brain lesions. Secondly, we analyse the pixel intensities of the segmented tumour boundaries to mathematically model the clarity and blurriness of tumour boundaries (factors relating to issues of low resolution and low signal-to-noise ratio). Finally, we evaluate the interobserver variation of bounding boxes to assess whether this annotation method has a role as an alternative, complementary, simplified method for outlining lesion margins. Additional evaluation is performed to determine if these bounding boxes can be used as a guide to improve the accuracy of the segmentation.
To improve the utility of iUS in neurosurgery, the limitations of its tumour-margin delineation capabilities need to be understood and measured. Standardised training, new supporting techniques and the development of new tools are required to reduce uncertainty and error. By formally addressing the issue of tumour segmentation error and uncertainty, we hope to highlight a fundamental challenge with iUS that has hindered universal adoption. Until now, this issue has largely been an implicit challenge recognised by experienced iUS operators.

Materials and methods
A preliminary study was conducted using 4 annotators (AN) experienced with iUS, a neuroradiologist and 3 neurosurgeons, to determine the foundation of our hypothesis. The neuroradiologist and the three neurosurgeons are all post-training doctors. All clinicians involved have extensive research backgrounds and familiarity with segmentation tools and protocols. Specifically: Neuroradiologist - 10 years of US and neuroimaging experience; An1 - 10 years of neuro-oncology experience with 10 years of ≈ 2-3 cases per week using iUS; An2 - 8 years of neuro-oncology experience with 6 years of ≈ 1 case per week using iUS; An3 - 9 years of neuro-oncology experience with 9 years of ≈ 2 cases per week using iUS. The order of the annotators has been randomised.

Data
The dataset consists of 30 images, from 30 patients, taken during brain surgery at Imperial College NHS Trust London. The images were retrospectively selected by the neuroradiologist from cine clips of US sweeps that captured the entire tumour. Images with a field of view which covered the boundaries of the tumour and at least 2 cm of surrounding normal brain were selected with reference to the preoperative MRI (Magnetom Vera 3T, Siemens) to ensure accuracy. All images were captured before tumour resection. For patient information please see Table 1. Images were acquired using a Canon i900 US machine (Canon Medical Systems, Japan) with the 8 MHz i8MCX1 microconvex probe, and were stored using the Digital Imaging and Communications in Medicine (DICOM) format. For all images, the dynamic range was fixed at 70 dB. The aperture was set to either 1 or 3. The power level was set to either 5 or 7. The gain varied between 79 and 98. The maximum depth varied between 5 cm and 11 cm, with the depth focus between 1.8 cm and 8.1 cm. All images are of size 960 × 1280 pixels.
The study had full local ethical approval by the Health Research Authority (HRA) and Health and Care Research Wales (HCRW) authorities. Study title - US-CNS: Multiparametric Advanced Ultrasound Imaging of the Central Nervous System Intraoperatively and Through Gaps in the Bone, IRAS project ID: 275556, Protocol number: 22CX7609, REC reference: 22/WA/0259, Sponsor: Research Governance and Integrity Team (RGIT).

Annotation protocol
The boundaries of tumours were annotated, for both segmentations and bounding boxes, using 3D Slicer (5.4.0) [8]. The annotations provided by the neuroradiologist were made using the benefit of the full US dataset, cross-registration with the preoperative MRI and patient metadata; we define these as the reference standard/ground truth for this study. The other annotators were given only the individual 2D US images. These annotations were used to evaluate the consistency and accuracy of the annotation capability, comparing inter-neurosurgeon variation and dissimilarity against the annotations of the neuroradiologist. Whilst this is unlike normal clinical practice (where the MRI and full real-time US would be available and employed by the operator), the aim of this study is to assess the ability of B-mode US to delineate brain lesion margins in isolation. The utility of bounding boxes as a guide to refine tumour boundary segmentation was then assessed by An1 repeating their segmentations 3 months later (to mitigate bias) with the reference standard bounding boxes produced by the neuroradiologist overlaid.
The bounding box annotations are created by taking the maximum and minimum x and y coordinates from the annotators' segments and fitting a box around the outer limits of each segment, as opposed to re-annotating.
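The conversion from a segment to a bounding box described above can be sketched as follows. This is a minimal illustration in plain Python, assuming a segmentation is represented as a set of (row, col) pixel coordinates; the names `bounding_box` and `box_pixels` and the toy data are hypothetical and are not taken from the study's code.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]          # (row, col) image coordinate
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def bounding_box(segment: Iterable[Pixel]) -> Box:
    """Fit an axis-aligned box around the outer limits of a segment."""
    pixels = list(segment)
    if not pixels:
        raise ValueError("an empty segmentation has no bounding box")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def box_pixels(box: Box) -> Set[Pixel]:
    """Return every pixel covered by the (inclusive) box."""
    r0, c0, r1, c1 = box
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


# Toy L-shaped segment (hypothetical data, not from the study).
segment = {(2, 3), (3, 3), (4, 3), (4, 4), (4, 5)}
box = bounding_box(segment)
print(box)                   # (2, 3, 4, 5)
print(len(box_pixels(box)))  # 9
```

Representing the box as a filled pixel set means the same overlap metrics can be applied to boxes and to segments without special cases.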

Statistical analysis
Three statistical metrics were used to compare the similarity between annotations. Within the mathematical framework, let A and B represent two annotations from the same image, i.e. A could be the neuroradiologist's annotation and B one of the neurosurgeons' annotations. The first two metrics that we define are methods to quantify the degree of overlap (indirectly, the degree of similarity in the shape and volume) between A and B. The first is the intersection over union (IoU) [26],

$$IoU = \frac{|A \cap B|}{|A \cup B|},$$

which is defined as the set of image coordinates occupied in the intersection of the two segmentations, divided by the union of the two segmentations, the total set of image coordinates occupied by both segmentations. The second is the Sørensen-Dice Similarity Coefficient (DSC) [12],

$$DSC = \frac{2|A \cap B|}{|A| + |B|},$$

which is defined as twice the intersection of the two segmentations divided by the sum of the cardinalities of the two segmentations. For both IoU and DSC, the closer the value to 1 the better the score.
For evaluation of the uncertainty of the boundary delineation, the contours of the segmentations were compared using the Hausdorff Distance (HD) [14], $HD(B_c, A_c)$, which determines the maximum Euclidean distance (in pixels) between all closest point pairs between the two contour sets. The subscript c denotes the contours of the corresponding segmentations, where the contours are the lines intersecting the endpoints of the segmentations. For this metric, the closer the value is to 0 the better.
This analysis was performed using Python, pynrrd [7] for reading the segmentation files and scikit-image [33] and SciPy [34] for the quantitative analysis.
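As an illustration of the three metrics, the sketch below computes IoU, DSC and HD on two toy rectangles. It is written in plain Python rather than with the scikit-image and SciPy routines used in the study. Several points are assumptions made for the example: segmentations are sets of (row, col) coordinates, a contour is taken to be the pixels with at least one 4-connected neighbour outside the segment, and HD is computed in its symmetric form. All function names are hypothetical.

```python
import math
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col) image coordinate


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union: |A n B| / |A u B|."""
    union = a | b
    if not union:
        raise ValueError("both segmentations are empty")
    return len(a & b) / len(union)


def dice(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Sorensen-Dice coefficient: 2|A n B| / (|A| + |B|)."""
    total = len(a) + len(b)
    if total == 0:
        raise ValueError("both segmentations are empty")
    return 2 * len(a & b) / total


def contour(seg: Set[Pixel]) -> Set[Pixel]:
    """Assumed contour: pixels with a 4-connected neighbour outside seg."""
    return {
        (r, c) for (r, c) in seg
        if any(n not in seg for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
    }


def directed_hd(p: Set[Pixel], q: Set[Pixel]) -> float:
    """Largest distance from a point in p to its closest point in q."""
    return max(min(math.dist(x, y) for y in q) for x in p)


def hausdorff(a_c: Set[Pixel], b_c: Set[Pixel]) -> float:
    """Symmetric Hausdorff distance (in pixels) between two contour sets."""
    return max(directed_hd(a_c, b_c), directed_hd(b_c, a_c))


def rectangle(r0: int, c0: int, r1: int, c1: int) -> Set[Pixel]:
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


# Two 4x4 toy segments, B shifted two columns right of A (hypothetical data).
A = rectangle(0, 0, 3, 3)
B = rectangle(0, 2, 3, 5)
print(dice(A, B))                         # 0.5
print(hausdorff(contour(A), contour(B)))  # 2.0
```

The brute-force distance search is quadratic in the number of contour points, which is acceptable for a toy example but would be slow on full-resolution contours.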

Tumour border pixel dispersion substudy
To assess the visible distinguishability of the tumour from the surrounding normal tissue, the dispersion of pixel intensities around the neuroradiologist's tumour margins is evaluated by measuring the pixels along the segmentation's contour plus a 10-pixel border, perpendicular to, and on both sides of, the contour. To calculate the properties of the distributions, a local maxima peak-finding method is implemented, where one peak = unimodal, two = bimodal and three or more = other (could be either multimodal or uniform).
Although the overlap scores indicate general similarity between the annotations, there are still inconsistencies; considering the precision required for tumour resection, this marginal difference can be considered impactful. The average HD, on the other hand, is a significant result, showing that there is frequent disagreement on at least one point along the tumour boundary.

Tumour border pixel dispersion substudy
First, we provide a simplified example to illustrate how smoothing affects pixel distributions along a binary boundary (perfect separability). In Fig. 1 a binary circle is created and both Gaussian and box filters are applied with increasing amplification. For the case of the binary circle without smoothing, the boundary is perfectly defined and as such the pixel distribution is bimodal. However, as the smoothing factor increases, the distribution tends towards uniformity or multimodality and, under severe blurring, becomes unimodal.
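The peak-count classification used in this substudy (one peak = unimodal, two = bimodal, three or more = other) can be sketched as follows. The study's actual peak-finding parameters are not stated, so everything here is an assumption made for illustration: the 8-bin histogram over the range 0-255, the simple local-maximum rule, the function names and the toy intensity samples.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], bins: int = 8,
              lo: float = 0.0, hi: float = 255.0) -> List[int]:
    """Count values into equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp hi into the last bin
        counts[idx] += 1
    return counts


def count_peaks(counts: Sequence[int]) -> int:
    """Count local maxima; a flat plateau is counted once."""
    padded = [0] + list(counts) + [0]
    return sum(
        1 for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]
    )


def classify(n_peaks: int) -> str:
    """Map a peak count to the categories used in the text."""
    if n_peaks == 1:
        return "unimodal"
    if n_peaks == 2:
        return "bimodal"
    return "other"


# Perfectly separable binary boundary: half the samples dark, half bright.
sharp = [0] * 10 + [255] * 10
# Intensities concentrated around one value, as under severe blurring (toy data).
blurred = [90, 120, 125, 130, 135, 140, 170]

print(classify(count_peaks(histogram(sharp))))    # bimodal
print(classify(count_peaks(histogram(blurred))))  # unimodal
```

On real ultrasound data the histogram would be noisy, so some smoothing or a minimum peak height would likely be needed before counting; that choice is left open here.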
From evaluating the tumour boundaries of the 30 images, 27 were classified as unimodal and 3 as other (Fig. 2). This result is strong evidence of the severity of the pixel intensity variation along the tumour boundary, which provides mathematical evidence for the ambiguity in defining discrete, definite boundary points.

Neuroradiologist-Neurosurgeon annotation variance
The results of the similarity comparison between the reference standard annotations produced by the neuroradiologist (which benefited from correlation with the preoperative MRI and the full US dataset) and the neurosurgeons are tabulated in Tables 2 and 3, and visualised using a box and whiskers plot in Fig. 3. This showed a moderate interobserver variance between the reference standard segmentations and the annotations performed by the neurosurgeons on the single slice B-mode images alone, highlighting the potential limitations and uncertainties of isolated B-mode in defining tumour boundaries.
Further evaluation of the bounding box proposal is conducted by measuring, individually for each image, the percentage of the neuroradiologist's segmentation contained within the neurosurgeon's corresponding bounding box. The average results per annotator are An1: 98.387%, An2: 98.833%, An3: 99.052%. From the results, it can be concluded that the bounding box approach is usable for localising the entire tumour mass, and is a suitable method for reducing inter-observer annotation variance (Fig. 4). Sample images are displayed in Figs. 5 and 6. An assortment of different sources of boundary aleatoric uncertainty is shown in Fig. 5, including images containing fuzzy borders and continued hyperechogenicity extending beyond the tumour. Shown in Fig. 6 are example cases where using bounding boxes has substantially improved the annotation similarity.
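The containment measure reported above, the percentage of the reference segmentation that falls inside an annotator's bounding box, can be sketched as below. The coordinate-set representation, the inclusive (row_min, col_min, row_max, col_max) box convention, the function name and the toy values are assumptions for illustration only.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]          # (row, col) image coordinate
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def percent_contained(reference: Set[Pixel], box: Box) -> float:
    """Percentage of reference-segmentation pixels lying inside the box."""
    if not reference:
        raise ValueError("the reference segmentation is empty")
    r0, c0, r1, c1 = box
    inside = sum(1 for r, c in reference if r0 <= r <= r1 and c0 <= c <= c1)
    return 100.0 * inside / len(reference)


# Toy reference segmentation: 4 rows x 5 columns = 20 pixels (hypothetical data).
reference = {(r, c) for r in range(0, 4) for c in range(0, 5)}
# Toy annotator box that misses the leftmost column of the reference.
annotator_box = (0, 1, 3, 4)

print(percent_contained(reference, annotator_box))  # 80.0
```

A value of 100 means the box fully encloses the reference segmentation; the measure says nothing about how much surrounding normal tissue the box also includes.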

Improving segmentation accuracy using overlaid bounding boxes as a guide
In the substudy looking at the impact of overlaid reference bounding boxes on An1's segmentation accuracy, we found a substantial improvement in segmentation similarity between the neurosurgeon and the reference standard. This result shows that the bounding box can be used as a visual anchor to minimise the uncertainty when segmenting.

Historical background and clinical landscape
Several intraoperative imaging devices have been developed to help neurosurgeons localise the tumour during surgery. Aside from CT and MRI scans, introduced in clinical practice during the '70s and '80s, frameless neuro-navigation devices were introduced during the '90s, and they were the first tools allowing intraoperative localisation of the tumour [6], although they rely on pre-operative images rather than real-time acquisition. The challenging problem of having a real-time tool for tumour location and visualisation has been partially addressed by the gradual introduction of iUS into the armoury of neurosurgical equipment [5]. Interestingly, in several cases the signal obtained from iUS was different when compared to that obtained using traditional imaging technologies (CT or MRI scan), thus drawing attention to the fact that iUS could be used not only as an intraoperative aid, but also as a complementary tool for those lesions of unclear nature or margins [5]. Moreover, coupling of iUS with MRI-based neuro-navigation has greatly improved the possibility of midline shift adjustment during surgery [16,17,19] and has been proven useful to improve tumour demarcation when close to eloquent areas [25]. In more recent years, further technological advancement has led to the development of 3D iUS [2,28,31,32]. The 3D reconstruction is automatically generated by the neuro-navigation software after an intraoperative acquisition through a single spatial plane. The images can be integrated with Doppler angiography when required, so that vessel encasement by an intra-cranial mass or aneurysm can also be detected [24,29]. Recent research has also been focusing on the possibility of integrating iUS with contrast [1,36].

Findings, challenges, and future perspectives
In this study, we highlight the specific challenge of tumour detection and tumour boundary delineation in cranial iUS.
Here we demonstrate that there remains moderate to high interobserver variation in the identification and segmentation of tumours on B-mode images acquired on a modern, current-generation US scanner, between four individuals with experience in iUS-guided brain surgery. Three broad elements likely contribute to this variance: 1) the specific qualities of brain tumours, 2) technical ultrasound factors and 3) operator influences.
Firstly, there are inherent features of brain tumours themselves which can make them difficult to delineate. Gliomas, in particular, are well known to be infiltrative tumours, meaning that they spread cancer cells beyond their obvious radiological margins [20,27]. Moreover, both high-grade gliomas and low-grade gliomas often show a degree of surrounding oedema which is, to this day, challenging to interpret in terms of differential diagnosis between reactive inflammatory tissue and actually infiltrated brain, even using more established neuroimaging tools such as MRI. In addition, the intraoperative imaging quality varies based on the stage of surgical resection [18,32]. Superficial, small-sized lesions are typically very well visualised by the acquisition and during resection. However, even a modest to moderate amount of bleeding causes a visible artefact that can hamper the surgical view beyond the limits of resection. To detect residual tumour, it is crucial to perform accurate haemostasis and remove all haemostatic material from the surgical cavity.
One of the main points of discrepancy in the present series was the definition of tumour margins. For example, in some cases, the lack of clarity over whether the highlighted hyperechogenicity of the tissue was caused by a continuation of the tumour mass or just reactive oedema introduces significant heteroscedastic aleatoric uncertainty when defining the exact margin. This challenge is further emphasised by how there remains no reference standard imaging technique that absolutely defines tumour extent, nor is it usually possible to remove tumours en bloc in intra-axial neurosurgery, precluding accurate histological correlation of tumour margins.
Fig. 4 Inter-neurosurgeon annotation variance. From top to bottom: ID 14, ID 16, ID 18. ID is the image index
Secondly, there are unique challenges that US presents. There are numerous ways that an ultrasound image can be altered, including changes in settings (such as gain and frequency), ultrasound machine, probe type, probe contact and probe angle. In most cases, several of these parameters need to be intentionally tailored to the particular tumour being imaged. For instance, a low-frequency probe, which has a trade-off in reduced spatial resolution, may be used to visualise a deep tumour, whereas a high-frequency probe, with high spatial resolution, may be used to image a superficial cortical tumour. This wide range of US options can greatly alter the final image, creating another source of variance and, in turn, aleatoric uncertainty which is arguably greater than typically seen with other established imaging modalities such as CT and MRI.
US is also vulnerable to several unique artefacts, such as acoustic shadowing [35] and acoustic enhancement, which can alter and obscure the image, creating a further source of uncertainty. These issues could be mitigated somewhat by the establishment of a standardised protocol for US settings and image acquisition. However, even then, it would be impossible to fully account for all scenarios due to the wide spectrum of tumours and anatomical locations. The potential for confounding artefacts and uncertainty regarding tumour boundaries further increases as surgery progresses due to increased deformation, oedema and potential obscuring blood products. This uncertainty could also be reduced with operator experience, which links to operator factors, the final source of variance. Currently, neurosurgical training in US is predominantly experiential, based on exposure to live cases in theatres. Whilst this is an essential aspect of learning a new surgical skill, it can greatly prolong the learning phase due to a relatively low rate of exposure to the imaging technique, as it necessitates an intraoperative setting with a craniotomy window. This is in contrast to CT and MRI, which neurosurgeons are much more comfortable with interpreting, owing, in part, to these being readily available and performed regularly on most patients.
There are several complementary ways that iUS accuracy could be improved and the steep learning curve could be flattened and shortened. These include the use of advanced multimodal US (including contrast-enhanced US) and navigated US. In addition, dedicated courses employing US phantoms with brain tumour models can greatly help, by providing applied formal training in US theory in addition to hands-on time with US scanning. In all cases, however, these approaches still require significant time investment and training. In this context, automated, computer-assisted, detection
In contrast, development in automated segmentation of iUS images is in its infancy, with no applications yet available. There are several reasons for this. As illustrated in this study, it is challenging to establish a ground truth dataset using manual segmentation due to high interobserver variance, especially as most published data usually contains only individual frames or volumes. Furthermore, unlike MRI, there is a paucity of neuro-oncology iUS imaging datasets. This is likely due to the relatively low number of iUS scans acquired, the greater logistical challenges with saving and downloading scans, the potentially large file sizes of video acquisitions, the greater variation of iUS across sites and the need for highly experienced annotators to perform the time-consuming segmentations. Considering these many barriers, we are far off the realisation of a reliable system to rapidly, automatically and accurately segment iUS brain tumour images. To bridge this gap, here we assess the utility of bounding boxes as an additional complementary tool for simplified tumour detection and delineation.
Unsurprisingly, we found much lower interobserver variation when using bounding boxes to define tumour location and margins compared to segmentations. Our proposed use of bounding boxes is a step in the direction of overcoming the above-mentioned challenges. From a clinical perspective, the bounding box would be useful for training purposes and immediate identification of the tumour mass. While this system is expected to be lower in specificity, the high sensitivity should assist inexperienced surgeons in detecting tumours and providing an area to focus on. Further, the whole signal change (fuzzy margin) would be included in the bounding box, thus making sure that there is no missing tumour from the targeted area. The process of annotation is also much quicker for bounding boxes. Although this will be annotator-dependent, from our experience bounding box annotation may take as little as 1/3 of the time of segmentation, reducing the manual labour cost.
Fig. 6 Examples where the segments have large dissimilarity whilst the bounding boxes don't. The left images are the original images, the middle from the neuroradiologist and the right from a neurosurgeon. From top to bottom: ID 11 AN 2, the bottom from ID 26 AN 1. ID is the image index
In computer vision and AI, segmentation methods will typically define, or assume, that the segmentation task can be framed as separating an image into sub-regions with defined and complete boundaries. This is done, for example, explicitly through the definition of a mathematical optimisation framework, or implicitly by training a neural network on common datasets. Because of this, the task of semantically segmenting brain tumours in US becomes uniquely difficult. From an engineering perspective, there are a large number of benefits to using bounding boxes. First and foremost, the reduced annotation complexity should facilitate the collection of large datasets, accelerating technical and clinical research into this topic. For technical development, the primary benefit is the reduced complexity of the estimation task, which should lead to highly accurate systems. The bounding boxes can also be used for different tasks, such as representing the segment as a probabilistic heat map [22] to account for the tumour infiltration [20,27], prompting large/powerful models such as the Segment Anything Model (SAM) [15], and tracking algorithms [3].

Conclusion
What has been highlighted in this paper is our thoughts on the challenges of segmenting brain tumours in 2D US images, with a preliminary study conducted to corroborate our hypothesis. For future work, larger curated and consensus-annotated datasets of iUS brain tumour images and volumes are needed to develop more accurate computer-assisted boundary detection tools. This is likely to only be achieved through multi-site collaboration and pooling of data.
There are a few limitations of this study which we hope will be addressed in future work. Firstly, the number of annotated cases was small, which is why we have refrained from identifying correlates, such as whether certain tumour types are more accurately segmented. Secondly, only single slices were used for segmentation as opposed to volumes. This is unlike the real-world use of iUS, where assessment of boundaries is based on live 3D sweeps of tumours and adjacent anatomy, plus often cross-correlation with preoperative MRI, both of which would help refine the accuracy of segmentations. However, we do comment that even though 3D information may be available, decisions are still biased by what is visible, which invariably would still be a 2D slice. Extending from this, there remains the recurring issue in neuro-oncology of there not being an accepted gold-standard ground truth for tumour boundaries, and our use of integrated multimodal MRI and US to create the reference standard segmentations has to be an accepted compromise. Finally, whilst bounding boxes may serve as an efficient method to improve the detection of tumours, this is at the expense of specificity, which is important for the prevention of inadvertent removal of normal, functional brain tissue.

Fig. 1
Fig. 1 The top row/example shows the effect on the boundary of a binary circle when Gaussian blurring is applied, using a kernel density estimate plot. The bottom row/example shows the same but using a box

Fig. 2
Fig. 2 Top is a plot of all tumour boundary pixel intensity distributions. Blue highlights unimodal distributions and red highlights other. Bottom is an example of a tumour boundary, shown using ID 009. ID is the image index

Fig. 3
Fig. 3 Box and whisker plots of the annotators' IoU, DSC and HD scores on the 30 images: red = segmentation, black = bounding box. What is highlighted is the greatly reduced median score and IQR when using the bounding box method. IoU is Intersection Over Union, DSC is Sørensen-Dice Similarity Coefficient, HD is Hausdorff Distance and IQR is Interquartile Range

Fig. 5
Fig. 5 Uncertainty caused by unclear boundary. The left images are the original images, the middle from the neuroradiologist and the right from a neurosurgeon. From top to bottom: ID 13 An3, ID 07 An3, ID 18 An2, ID 19 An2. ID is the image index

Table 1
Patient information.

Table 2
Shown are the IoU and DSC similarity results. The closer the value to 1 the more similar the annotations are. IoU is Intersection Over Union, DSC is Sørensen-Dice Similarity Coefficient

Table 3
Shown are the HD similarity results. The closer the value to 0 pixels the better aligned the annotation margins are. HD is Hausdorff Distance